Statistical inference asks: "Given this data, what are the most likely underlying parameters?" This slide bridges that question and convex optimization: we recast the probabilistic notion of likelihood as a structured program and show that, when the likelihood is log-concave, finding the best estimate is equivalent to solving a convex optimization problem.
The Likelihood Framework
The likelihood function is the probability distribution $p_x(y)$ considered as a function of the parameter $x$ for a fixed observed sample $y$. To estimate $x$, we employ Maximum Likelihood (ML) estimation: choosing the value that makes the observed data most probable.
$$\hat{x}_{ml} = \text{argmax}_x p_x(y) = \text{argmax}_x l(x)$$
In practice we work with the log-likelihood function, $l(x) = \log p_x(y)$, which is more convenient both analytically and numerically. Because the logarithm is a monotonically increasing function, it preserves the location of the maximum while turning products (from independent observations) into easy-to-manage sums.
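As a standard illustration (assuming a linear measurement model with i.i.d. Gaussian noise, which is not spelled out on the slide), the sum structure of the log-likelihood makes the ML estimate a least-squares solution:

$$y_i = a_i^T x + v_i,\quad v_i \sim \mathcal{N}(0,\sigma^2) \;\Longrightarrow\; l(x) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^m (y_i - a_i^T x)^2,$$

so $\hat{x}_{ml} = \text{argmin}_x \, \|Ax - y\|_2^2$, with $A$ the matrix whose rows are the $a_i^T$.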
The MLE Optimization Program (7.1)
We formalize the estimation as a mathematical program:

$$\begin{array}{ll} \text{maximize} & l(x) = \log p_x(y) \\ \text{subject to} & x \in C \end{array}$$
This program is a convex optimization problem if:
- The log-likelihood function $l$ is concave for each value of $y$.
- The feasible set $C$ (prior information) is described by linear equality and convex inequality constraints.
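A minimal numerical sketch of such a program, assuming Python with the numpy and cvxpy packages (neither is mentioned above), a Gaussian linear model, and a nonnegativity constraint playing the role of the prior set $C$:

```python
import numpy as np
import cvxpy as cp

# Synthetic data: y = A @ x_true + noise, with a nonnegative true parameter.
rng = np.random.default_rng(0)
m, n = 50, 5
A = rng.standard_normal((m, n))
x_true = np.abs(rng.standard_normal(n))
y = A @ x_true + 0.1 * rng.standard_normal(m)

# Under Gaussian noise, the log-likelihood is (up to constants) the negative
# sum of squared residuals, which is concave in x.
x = cp.Variable(n)
log_likelihood = -cp.sum_squares(A @ x - y)

# Prior information C = {x : x >= 0} enters as a convex constraint.
problem = cp.Problem(cp.Maximize(log_likelihood), [x >= 0])
problem.solve()

print("ML estimate:", x.value)
```

Because the objective is concave and the constraint is convex, the solver returns the global ML estimate rather than a local one.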
Integrating Constraints and Priors
To impose physical or prior constraints explicitly, we can redefine $p_x(y)$ to be zero for $x \notin C$. In the optimization, this means the log-likelihood is assigned the value $-\infty$ for parameters $x$ that violate the constraints, creating an impassable barrier for the optimizer; it is equivalent to adding the constraint $x \in C$ to the program above.
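One common way to write this (a restatement of the point above, not new material) is with an extended-value log-likelihood:

$$\tilde{l}(x) = \begin{cases} l(x) & x \in C \\ -\infty & x \notin C, \end{cases} \qquad \hat{x}_{ml} = \text{argmax}_x \, \tilde{l}(x),$$

which gives the same estimate as maximizing $l(x)$ subject to $x \in C$.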